AIbase

# Image-text interaction

SmolVLM Instruct GGUF
Apache-2.0
SmolVLM is a compact open-source multimodal model that accepts image and text inputs and generates text outputs. It is designed for efficiency and is suitable for on-device applications.
Image-to-Text Transformers English
Mungert
1,023
2
Gemma 3 12b It Qat Compressed Tensors
Gemma 3 is Google's lightweight cutting-edge open model family, built on the same research and technology used to create Gemini models. This model is multimodal, capable of processing both text and image inputs to generate text outputs.
Image-to-Text
gaunernst
867
1
Qwen2.5 VL 32B Instruct GGUF
Apache-2.0
Qwen2.5-VL-32B-Instruct is a 32B-parameter multimodal vision-language model that supports joint understanding and generation tasks for images and text.
Image-to-Text English
Mungert
9,766
6
Gemma 3 27b It Qat Q4 0 Gguf
Gemma is a family of lightweight open multimodal models from Google. It accepts text and image inputs and generates text outputs, with a 128K-token context window and support for over 140 languages.
Image-to-Text
google
69.29k
251
Qwen2 VL 2B Instruct
Apache-2.0
Qwen2-VL-2B-Instruct is a multimodal vision-language model that supports image-text-to-text tasks.
Image-to-Text Transformers English
FriendliAI
24
1
© 2025 AIbase